ABSTRACT

We see an increasing demand for in-the-cloud middlebox 
processing as applications and enterprises want their cloud 
deployments to leverage the same benefits that such services 
offer in traditional deployments. Unfortunately, today's cloud 
middlebox deployments lack the same abstractions for flex- 
ible deployment and elastic scaling that have been instru- 
mental to the adoption and success of cloud-based compute 
and storage services. The key challenge here is that such net- 
work processing workloads are fundamentally different from 
traditional virtualized compute and storage services. These 
differences arise as a consequence of the ways in which ten- 
ants need to compose different middlebox services and the 
network-level factors (e.g., placement, load balancing) that 
impact application performance. 

To address this challenge, we present the design and im- 
plementation of Stratos. Stratos allows tenants to specify 
logical middlebox deployments and provides efficient scal- 
ing, placement, and distribution algorithms that abstract away 
low-level issues in ensuring effective application performance. 
We demonstrate the effectiveness of Stratos using an experi- 
mental prototype, a limited deployment over EC2, and large- 
scale simulations. 



1. INTRODUCTION



Surveys show that enterprises rely heavily on in-network 
middleboxes such as load balancers, intrusion prevention sys- 
tems, and WAN optimizers to ensure application security 
and improve performance [35] [33]. As many of these applications
and services move to the cloud, enterprises would
naturally like to leverage the same performance and security
benefits in the cloud. This is evidenced by an increasing
number of commercial middlebox vendors providing virtual
appliances [3] [12] [13], research prototypes and startups
proposing in-the-cloud network processing services [35] [22]
[9], and the emergence of similar (albeit limited) offerings
from cloud providers themselves [2].

The ability to elastically scale deployments to match de- 
mand and to flexibly manage virtual compute and storage 
resources has been a driving factor contributing to the adop- 
tion of cloud deployments. Unfortunately, cloud customers 
today lack similar support and abstractions for their in-the-cloud
virtual middlebox (MB) deployments. Existing abstractions
treat MBs the same as any other compute nodes,
leading to brittleness, inflexibility and poor elasticity (§2).

MB deployments are different from traditional virtualized 
compute or storage resources in three respects: 
Composition: MBs are rarely used in isolation. Deploy- 
ments are typically structured as physical or logical chains 
where a given flow/packet is processed by a sequence of het- 
erogeneous MBs that lie on critical forwarding paths. The 
MB processing required may change depending on observed 
traffic patterns. Thus, (i) there needs to be intrinsic support
for static and dynamic MB composition, and (ii) more importantly,
management functions must consider chain-level
performance. 

Network-aware scaling: Since MBs are on the data path, 
their network footprint more critically impacts application 
performance compared to virtual compute services. In par- 
ticular, the contention between MB and other network traffic 
can vary dynamically in time and space for complex chains. 
Coupled with MB heterogeneity, variable MB performance 
in virtualized environments, and heavy resource multiplex- 
ing in clouds, this necessitates a new approach to identify 
bottlenecks and make informed horizontal scaling decisions. 
As we show, traditional "network-agnostic" scaling approaches 
based on monitoring CPU/memory do not work even for 
simple MB chains. 

Fine-tuning network interactions: Network bottlenecks can 
hurt MB performance and hence tenant applications. Yet, the 
presence of multiple MBs in a chain provides many useful 
knobs for minimizing the potential for contention between 
MB sourced/destined traffic and other traffic. Tuning these 
knobs is crucial because it helps optimally leverage the pro- 
cessing capacity of MB instances. Aside from extracting 
more out of MBs, this helps: (i) improve the effectiveness
of the scaling decisions, and (ii) support a greater number of
elastic tenant MB chains at the same or lower cost. 

In essence, providing the management flexibility and hor- 
izontal scalability for MB deployments similar to compute 
and storage services requires designing new cloud network 
functions that explicitly manage the network configuration 
and interactions of MBs. Thus, we design and implement
Stratos, a new network-aware orchestration layer for MBs.

Stratos's configuration plane allows a cloud tenant to flex- 
ibly compose and dynamically alter virtual topologies that 
contain arbitrary MB chains (§3.1). The configuration plane
exports an annotated logical topology view to tenants, where
the annotations are hints on MB network footprint. Stratos's 
management plane implements efficient algorithms to map 
the logical view to an appropriate physical realization. 

Stratos's management plane also automatically and accu- 
rately determines the bottleneck for a tenant's deployment 
using an application-aware heuristic that relies on applica- 
tion reported performance measures (§2). The heuristic im- 
plicitly takes into account MBs' holistic resource consump- 
tion, including compute, memory and the network. 

Finally, Stratos explicitly manages the network interac- 
tions of MBs in order to maximize the network capacity be- 
tween them. Specifically, the management plane implements 
two functions that both take profiles of MB network footprint
and logical MB topologies as input: (i) A placement
algorithm that logically partitions the physical MB topology
into per-rack partitions and places them with minimal
inter-partition communication (§5). (ii) A traffic distribution
algorithm to route traffic across the different MBs/replicas
that further reduces the network footprint (§6). Placement
is triggered when a new tenant arrives, scaling decisions are 
made, or network-wide management actions occur (e.g., VM 
migration). Traffic distribution is invoked periodically to 
re-balance traffic based on changing MB network footprint, 
changing network load from other tenants, or a placement 
decision. 

We implement Stratos as a collection of modules running 
atop Floodlight [6] (~7500 LOC). These modules (i) parse
tenant chain configuration files, (ii) gather performance metrics
from network switches, applications, and MBs using
SNMP, (iii) execute Stratos's scaling, placement, and flow
distribution algorithms, (iv) launch and terminate VMs using
Xen [15], and (v) install forwarding rules in hypervisor-resident
Open vSwitches [8].

We conduct controlled experiments of our prototype over 
a 24-node/72-VM data center testbed. We also evaluate a
stripped-down Stratos for EC2 that implements only our
scaling and load distribution heuristics. Finally, we conduct 
simulations to study Stratos's impact at scale. 

Our central goal is to verify the importance of network- 
awareness embedded into Stratos, be it in scaling, placement 
or distribution in supporting MB services in the most effec- 
tive fashion. To this end, we find: 

• Stratos helps optimally meet application demand by 
accurately identifying bottlenecks and either adding the 
appropriate number of MB replicas, or redistributing 
traffic at coarse and fine timescales to overcome con- 
gestion. 

• Network-agnostic approaches use up to 2X as many
MBs as Stratos, yet they cannot meet application demand,
resulting in severely backlogged request queues.

• All three network-aware components of Stratos are cru- 
cial to extracting the ideal overall benefits of Stratos. 

• Even without intrinsic support for placement, Stratos 
can elastically meet the demands of applications in EC2. 




• Stratos imposes little setup overhead. Stratos's fine-grained
load distribution plays a crucial role in sustaining
application performance despite changing network
conditions.

Figure 1: Example middlebox and server topology

2. BACKGROUND 

MBs play a key role in enterprises and private data centers
[33], with application traffic often traversing multiple
MB appliances. With enterprises migrating their applications
to the cloud, a wide variety of cloud-provided services
(e.g., Amazon's Elastic Load Balancer [2]) and third-party
VM images [3] [12] [13] have emerged to supply the desired
MB functionality. In fact, recent surveys show that 87%
of IT professionals believe that network-level MB services
should be a key part of cloud-based IaaS offerings [1].

In this section, we describe typical approaches used today 
to leverage MBs in the cloud and show that, due to lack of 
suitable abstractions and intrinsic management functionality, 
these approaches offer limited to no flexibility and impede 
elastic scaling. Our observations are derived on the basis of 
our own experience in trying to deploy such network services
in Amazon EC2 [2].

Composition: In contrast to traditional compute applications,
network services are frequently deployed as a "chain"
of several MBs [24]. For example, traffic may enter the
data center through a WAN optimizer or redundancy elimination
(RE) MB, be mirrored to an intrusion detection system
(IDS), directed to a load balancer, and assigned to one
of several application servers (Figure 1).

Since today's cloud providers are largely geared toward
traditional applications, they provide little control over network
topology and routing [2] [10], and third-party overlay
services [14] only facilitate topologies containing directly
addressed endpoints (in contrast, MBs should frequently be
transparent). As a result, tenants are forced to run MBs as
generic VMs and manually piece together tunnels, traffic
splitters, and other software to route the desired traffic. Such
manual and distributed configuration makes it hard to dynamically
add new functionality, add replica MBs to manage
load, or route around failed MBs. As an anecdote, implementing
the relatively simple set of MB traversals shown
in Figure 1 required several days of trial-and-error to obtain
a working setup in EC2, which relied on several third-party
tools and configurations strewn across VMs.

Automation scripts are insufficient since they make dynamic
changes possible but not easy. Indeed, the tenant still
has to implement extra logic - e.g., to distribute appropriate
traffic subsets to MB replicas - which may change when
new types of MBs are deployed in a chain (e.g., transcoding
or compression engines, which change expected load). More
importantly, the implemented logic may be fundamentally
insufficient due to lack of intrinsic support from the cloud
provider. We highlight this next.

Figure 2: Lack of scaling due to network bottlenecks

Figure 3: Ineffective scaling due to poor placement

Elastic Scaling: Being on the critical forwarding path, 
MBs' performance and network footprint can significantly 
impact end-to-end application performance. Unfortunately, 
there are no effective schemes today to identify bottlenecks 
in, and elastically scale, MB chains. This is because existing 
approaches [11] ID do not recognize the chain as an entity
and there are no intrinsic mechanisms to help control MB 
chain performance. 

Given today's compute-centric view, tenants could moni- 
tor basic resource consumption (CPU, memory, I/O) to iden- 
tify if individual MBs are bottlenecks. Unfortunately, this 
may not be sufficient because the bottleneck may be a net- 
work link on the path between two MBs in a chain. While 
network bottlenecks also impact regular cloud applications, 
these effects get magnified in the context of MBs because 
they lie on the critical forwarding path. We illustrate this in
Figure 2, where the IPS and RE MBs run at 50% utilization
and hence no scaling is triggered. Yet the application's performance,
which is bottlenecked by the congested link, can
be improved by adding an RE instance (outside rack-2) and
sending some part of the traffic to it. In general, unless the
performance constraints imposed by all elements in a chain
- MBs and network links alike - are taken into account, bottlenecks
cannot be identified or overcome effectively.
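
The scenario above can be made concrete with a small sketch. It models a chain as a list of MBs and links, takes end-to-end throughput as the smallest capacity along the chain, and then applies a compute-centric rule that looks only at MB utilization. The capacities, the 80% threshold, and all names are illustrative assumptions rather than values from our testbed.

```python
def chain_throughput(elements):
    """End-to-end throughput: the smallest capacity along the chain."""
    return min(capacity for _name, _kind, capacity in elements)


def mb_utilization(elements, throughput):
    """Utilization of each MB when the chain carries `throughput`."""
    return {
        name: throughput / capacity
        for name, kind, capacity in elements
        if kind == "mb"
    }


# Hypothetical capacities in Mbps; the link between the two MBs is congested.
chain = [
    ("IPS", "mb", 800.0),
    ("inter-rack link", "link", 400.0),
    ("RE", "mb", 800.0),
]

achieved = chain_throughput(chain)
utilization = mb_utilization(chain, achieved)
# A compute-centric rule (assumed): scale only when some MB reaches 80%.
scaling_triggered = any(u >= 0.8 for u in utilization.values())

print(achieved, utilization, scaling_triggered)
```

Both MBs sit at 50% utilization because the link caps the traffic they ever see, so a monitor that watches only MB resources never triggers scaling.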

One of the key reasons that effective elastic MB chain 
scaling is hard today is that cloud providers have no mecha- 
nisms to actively manage the network resources available to 
the chains. For example, it has been shown that EC2's VM 
placement algorithm is essentially random given the instance 
size [32]. As such, it is quite possible that a new replica
is launched behind a congested network link, in which case
the bottleneck would not be overcome effectively. We illustrate
this in Figure 3, where the IPS instance runs at 80%,
triggering scaling, but the added replica does not improve 
end-to-end performance because of network congestion at 
the replica's location. 

Furthermore, it is important to allocate the amount of traffic
going to different replicas in a manner that takes prevailing
network congestion into account, and, equally importantly,
re-allocate as network conditions (as well as MB load) change;
otherwise, the scaling decision may not have the desired effect.
This is impossible to do in any effective manner today
as network utilization information is unavailable to tenants.
We show this in Figure 4, where N/2 flows are sent over
the congested inter-rack link. An optimal network-aware solution
in this case would be to only send N/6 flows on the
congested link.

Figure 4: Ineffective flow allocation due to lack of visibility

3. STRATOS OVERVIEW

Our vision is to bring the same degree of flexibility and
elasticity that we have with other aspects of cloud computation
(virtual computing, virtual storage) to in-the-cloud MBs. In
this section, we give an overview of our system, Stratos,
which addresses this challenge.

At a high level, Stratos can be viewed as a network-aware
orchestration layer that enables cloud tenants to easily
manage MB deployments in the cloud without any of the
complexity discussed earlier. We envision that all of the
network-aware functionality needed to enable flexibility and
elasticity is implemented by the cloud provider.

3.1 Stratos tenant interface 

Instead of composing middlebox and application server
topologies through a smattering of third-party tools and configurations,
tenants define logical topologies using high-level
abstractions (Figure 5). These topologies are automatically
transformed into a set of forwarding rules defining how application
traffic flows between server and MB instances. In
doing so, Stratos abstracts away the physical realization:
how many instances of each MB function are deployed and where.

Here, we use the notion of a chain as the basic abstraction
for describing the path that specific traffic flows should take.
A chain begins with a source of traffic (e.g., Internet clients), 
contains a sequence of one or more middleboxes the traffic 
should traverse (e.g., IDS and load balancer), and ends with 
a destination (e.g., a set of web servers). Each edge in a 
chain is annotated with an expected traffic gain/drop factor 
that specifies the ratio of input-to-output packets (bytes) on 
each specific middlebox in the chain. For instance, a firewall
may drop packets and an RE module may compress packets
on the fly. The traffic gain factors capture these effects
since they impact the amount of traffic that traverses links 
between MBs. A tenant's topology could contain multiple 
chains with overlapping middleboxes. 
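
As a rough illustration of the chain abstraction, the sketch below represents a chain as a source, an ordered list of MB types, and a destination, and derives the expected traffic volume on each edge from the per-MB factors. The class and field names are hypothetical, and the factor is read here as a multiplier on volume (output = input × factor), which is an assumption; the inverse convention would work equally well as long as it is applied consistently.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    """One middlebox type in a chain, with its traffic gain/drop factor."""
    mb_type: str
    factor: float  # assumed meaning: output volume = input volume * factor


@dataclass
class Chain:
    """A logical chain: source, ordered MB types, destination."""
    source: str
    destination: str
    hops: list = field(default_factory=list)

    def edge_volumes(self, offered):
        """Expected traffic volume on each edge, from source to destination."""
        volumes = [offered]
        for hop in self.hops:
            volumes.append(volumes[-1] * hop.factor)
        return volumes


# Hypothetical chain: a firewall that drops half the traffic, then a load
# balancer that forwards everything it receives.
chain = Chain(
    source="internet-clients",
    destination="web-servers",
    hops=[Hop("firewall", 0.5), Hop("load-balancer", 1.0)],
)
volumes = chain.edge_volumes(100.0)
print(volumes)
```

The per-edge volumes are what matter for placement and distribution, since they determine how much traffic each link between MBs has to carry.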

3.2 Stratos internals

Figure 5: Example tenant logical topologies (one with a single chain, one with two overlapping chains)



In mapping this logical view to an actual physical realization,
Stratos needs to address three key challenges, each
addressed by a corresponding Stratos component (Figure 6):

• Elastic Scaling: How many physical MB instances of 
each type need to be deployed? 

This module takes in as input the logical topology given 
by the cloud tenant, the tenant's current physical con- 
figuration, and any service-level requirement that the 
tenant desires (e.g., upper bounds on cost or lower bounds 
on application latency). It uses periodic measurements 
of the end-to-end application performance to decide 
the optimal number of instances of different middle- 
boxes necessary to meet the given service requirement. 

• Placement: Where should these MBs be placed inside 
the cloud provider's network? 

The placement module takes in as input the current 
state of the cloud provider's physical network topology 
(e.g., available racks, available slots, available band- 
width between racks), the logical topology of the client, 
the current physical instantiation of this topology across 
the provider network, and the number of new MBs of 
different types that need to be initiated. Given these 
inputs, it decides where to place the new MBs to avoid 
network bottlenecks. As a special case, it also imple- 
ments an initial placement interface which starts with 
zero MBs. 

• Flow Distribution: How should the traffic be routed 
through the different MBs? 

The distribution module takes as input a given physi- 
cal instantiation of a tenant chain (i.e., the number and 
placement of the MBs), measured (or statically spec- 
ified) traffic gain/drop factors for the MBs, and the 
current network topology with link utilizations to op- 
timally distribute the processing load between the dif- 
ferent MBs. The goal here is to reduce network con- 
gestion effects for the traffic flowing between MB in- 
stances, as well as balance the CPU/memory utiliza- 
tion of MB instances. 

In designing the individual modules and in integrating them,
Stratos takes into account both computational loads and
network-level effects. This helps ensure that the scaling step homes in
on the true bottlenecks and that good placement and load
balancing are implemented for the current workload. This
also ensures that there is sufficient capacity to efficiently add
new MBs in the future.

Figure 6: Overview of the high-level functionality and interfaces
to the client and the cloud provider in Stratos to enable
flexible middlebox deployment in the cloud

More precisely, when the scaling module decides to in- 
crease the number of MBs, it invokes the network-aware 
placement module to decide where the new MBs need to be 
placed. The placement module in turn calls the flow distri- 
bution module to decide the optimal distribution strategy for 
the chosen placement that takes into account network-level 
effects. As MB network footprints change, the flow distri- 
bution module can redistribute load to further improve the 
chain's end-to-end performance. 

3.3 Interacting with other Provider Functions 

In order to achieve the network-aware orchestration, we 
need new management APIs to facilitate interaction between 
Stratos and existing cloud functions. Specifically, Stratos interacts
with the cloud provider's monitoring and VM deployment
components as shown by the dotted arrows in Figure 6.
The interaction occurs at two different timescales (downward
arrows). First, on a coarse-grained timescale, Stratos's
placement logic may be invoked (left down arrow) whenever
network-wide management actions occur (e.g., VM migration).
Second, the monitoring layer periodically reports link
utilizations to Stratos's flow distribution module (right down
arrow). If there is significant change in background (non-Stratos)
network traffic, the flow distribution module can
invoke redistributions across tenant chains. Last, Stratos's
placement logic specifies constraints on the location of new 
MBs at the end of scaling, or that of MBs and application 
VMs at chain initialization time, to the cloud provider's VM 
deployment module (upward dotted arrow). 

The focus of this paper is on the internal logic of Stratos;
i.e., addressing the challenges highlighted in Section 3.2. In
the next three sections, we discuss the algorithmic frameworks
underlying the above Stratos modules. We do so in a
top-down fashion, starting with the application-aware scaling
(§4), followed by the rack-aware placement (§5), and
the network-aware traffic distribution mechanism (§6).

4. ELASTIC SCALING 

The ability to scale capacity as needed is a major benefit
of deploying applications in the cloud. This means that the
chain traversed by application traffic must also be scaled to
avoid becoming a performance bottleneck.

To illustrate the difficulty in scaling tenant chains, we start 
by considering several strawman approaches and discuss why 
these solutions are ineffective. Building on the insight that a 
tenant's ultimate concern is the end-to-end application per- 
formance, we design a practical scaling heuristic for elasti- 
cally scaling a tenant's chain. 

4.1 Strawman approaches 

We considered several strawman approaches for deciding 
which MBs to scale, but they turned out to be ineffective: 

1. Scale all MB types¹: The simplest solution for a bottlenecked
chain is to add extra instances for each MB
type in the chain. This guarantees the bottleneck will
be eliminated, but it potentially wastes significant resources
and imposes unneeded costs (especially when
only one MB is bottlenecked²).

2. Per-packet processing time: The average per-packet
processing time at each MB provides a common, middlebox-agnostic
metric. If a chain is bottlenecked, the MB
with the greatest increase in per-packet processing time
is likely the culprit. However, not all MBs follow a one-packet-in,
one-packet-out convention (e.g., a WAN optimizer),
and it is unclear if we can calculate a useful
per-packet processing time in this case.

3. Offered load: Alternatively, we could leverage CPU
and memory utilization or other load metrics (e.g., connections/second).
However, different types of MBs
have different resource or functional bottlenecks [21],
and these bottlenecks may vary with the workload itself
(e.g., a high redundancy workload may stress an RE
module more). Even if we set this aside, this approach,
along with #1 and #2 above, is network-agnostic and
can lead to poor scaling decisions, as we argued in Section 2.

Another candidate, benchmarking MB throughput offline, 
is also unsuitable since it is based on a fixed traffic mix; a 
change in the traffic mix may cause the MB to bottleneck at 
a rate lower or higher than the benchmarked throughput. In 
Section 8, we use #3 as an example to show that naive approaches
either identify the wrong bottleneck or take scaling
decisions that result in using 2X more MBs than needed.

Ultimately, a tenant is concerned with (i) the performance
of their applications and (ii) the cost of running their deployments.
Together, these motivate the need to scale the
deployment up/down depending on an application-reported
performance metric to minimize aggregate cost while ensuring
acceptable performance. Many cloud applications already
track such metrics for elastic scaling (e.g., requests
per second served) and could easily export them to Stratos.

¹ "MB type" refers to a specific type of middlebox.

² Unless otherwise specified, we use "MB" to refer to a single instance of
a specific type of middlebox.



scale_up_single(MboxArray M)
 1  for i ∈ [0, |M|]:
 2    do
 3      improves ← False
 4      add_instance(M[i])
 5      wait(Duration)
 6      foreach app ∈ Apps:
 7        if PerfImprovement(app) > thresh: improves ← True
 8      if improves = False: remove_instance(M[i])
 9    while improves = True
    Fallback: scale all in chain simultaneously

scale_multiple(Bottlenecked Chains):
10  foreach C ∈ Chains:
11    Overlap ← {}; SharedBottlenecks ← {}
12    foreach C' ≠ C ∈ Chains:
13      if overlap(C, C'): Add C' to Overlap
14      if Bottlenecked(C'): Add C' to SharedBottlenecks
15    if |Overlap| = 0:
16      scale_up_single(C.mbs)
17    else if |SharedBottlenecks| = 0:
18      scale_up_single(unique_mbs(C, Overlap))
19    else:
20      scale_up_single(shared_mbs(C, Overlap))
    Fallback: scale each chain sequentially



Figure 7: High-level sketch of the scaling heuristic in 
Stratos. For clarity, we only show the common case opera- 
tion and highlight one possible fall back solution. Note that 
in the multi-chain case, non-overlapping scaling trials can 
be run in parallel. 
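
The single-chain part of Figure 7 can be sketched in runnable form as follows. This is an illustrative sketch, not the prototype's code: the callbacks `add_instance`, `remove_instance`, and `measure` are hypothetical names, the wait for the measurement interval is assumed to happen inside `measure`, the fallback is omitted, and the toy environment (per-instance capacities, demand) is invented for the example.

```python
def scale_up_single(mb_types, add_instance, remove_instance, measure, thresh=0.15):
    """Run scaling trials over one chain (lines 1-9 of Figure 7).

    measure() returns {app: performance}. An added instance is kept only if
    at least one application improves by more than `thresh` (relative).
    Returns the list of MB types for which an instance was kept.
    """
    kept = []
    for mb in mb_types:
        while True:
            before = measure()
            add_instance(mb)
            after = measure()
            improves = any(
                after[app] - before[app] > thresh * before[app] for app in after
            )
            if not improves:
                remove_instance(mb)  # revert the trial, move to next MB type
                break
            kept.append(mb)
    return kept


# Toy environment with hypothetical numbers: throughput is limited by the
# slowest MB type and by the offered demand.
counts = {"IDS": 1, "RE": 1}
per_instance = {"IDS": 100.0, "RE": 40.0}
demand = 100.0


def measure():
    capacity = min(counts[m] * per_instance[m] for m in counts)
    return {"web": min(demand, capacity)}


def add_instance(mb):
    counts[mb] += 1


def remove_instance(mb):
    counts[mb] -= 1


kept = scale_up_single(["IDS", "RE"], add_instance, remove_instance, measure)
print(kept, counts, measure())
```

In this toy run the IDS trial is reverted because the RE is the bottleneck, and RE instances are added until a further instance no longer helps.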

4.2 Application-Aware Scaling Heuristic 

We design a heuristic approach that leverages an application- 
reported metric for scaling tenant chains. Our intuitive goal 
here is to ensure that the application SLAs are met, even if 
it means erring on the conservative side and launching a few 
more instances than what is needed optimally. The scaling 
process is triggered by a significant change in the perfor- 
mance of any of the applications in a tenant deployment for 
a sustained period of time (our prototype checks to see if 
there is sustained unmet demand or the average end-to-end 
latency increases by 15% over a 30s interval). We
first describe the scaling process for a single chain and then 
extend it to multiple chains. The latter can be extended in a 
straightforward manner to scaling across multiple tenants. 
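
One way to express the trigger condition is sketched below. It assumes the monitoring layer hands over per-sample unmet demand and latency readings for the most recent interval plus a baseline latency from the preceding interval; that input format and the function name are assumptions made for illustration.

```python
def should_scale_up(unmet_demand, latencies_ms, baseline_latency_ms, increase=0.15):
    """Decide whether to start scaling trials for one monitoring interval.

    unmet_demand: per-sample unmet demand over the interval (30 s in the
        prototype); "sustained" is read as every sample being non-zero.
    latencies_ms: per-sample end-to-end latency over the same interval.
    baseline_latency_ms: average latency of the preceding interval.
    """
    sustained_unmet = len(unmet_demand) > 0 and all(u > 0 for u in unmet_demand)
    latency_up = False
    if latencies_ms and baseline_latency_ms > 0:
        average = sum(latencies_ms) / len(latencies_ms)
        latency_up = average >= baseline_latency_ms * (1 + increase)
    return sustained_unmet or latency_up


steady = should_scale_up([0, 0, 0], [100, 101, 99], 100)
backlog = should_scale_up([5, 3, 8], [100, 100, 100], 100)
slow = should_scale_up([0, 0, 0], [120, 130, 125], 100)
blip = should_scale_up([0, 4, 0], [100, 100, 100], 100)
print(steady, backlog, slow, blip)
```

A single sample of unmet demand does not trigger scaling under this reading; only a sustained backlog or a sufficiently large rise in average latency does.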

Single Chain. Our heuristic performs a set of scaling trials,
scaling each MB type in a tenant-specified chain one
instance at a time as shown in lines 1-9 in Figure 7. We iterate
through the chain and keep an added instance as long
as we observe an improvement in the applications' performance
(in our prototype, we look for a 15% improvement in
throughput and a drop in unmet load). Note that multiple applications
could share the same chain; thus, we look for an
improvement in at least one such application. (As an optimization,
we only need to look for improvement in bottlenecked
applications.) If we see no improvement, then we
revert the added instance and move to the next MB type in
the chain. The scaling procedure terminates when we reach
the end of the chain or we see no more improvement.

Scale down occurs in a similar fashion except that we look 
for demand drops; our prototype checks if there is no unmet 
demand and the application's throughput drops by a certain 
percentage over a 1 minute interval. Our current prototype 
selects replicas in increasing order of volume served to try 
scaling them down (i.e., removing them). To prevent scale 
up/down oscillations, we use a "damping" factor and wait 
for some time (our prototype uses 25 seconds) before re- 
attempting scaling. 
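
The two scale-down details above, the damping wait and the order in which replicas are tried, can be sketched as follows. The class and function names are hypothetical, and the caller is assumed to pass in the current time (for example from a monotonic clock).

```python
class Damper:
    """Suppress scaling attempts that follow a previous one too closely."""

    def __init__(self, hold_seconds=25.0):
        self.hold_seconds = hold_seconds
        self.last_attempt = None

    def try_acquire(self, now):
        """Return True and record the attempt if enough time has passed."""
        if self.last_attempt is not None and now - self.last_attempt < self.hold_seconds:
            return False
        self.last_attempt = now
        return True


def scale_down_order(volume_by_replica):
    """Replicas in increasing order of volume served (first to be removed)."""
    return sorted(volume_by_replica, key=volume_by_replica.get)


damper = Damper()
attempts = [damper.try_acquire(t) for t in (0.0, 10.0, 25.0)]
order = scale_down_order({"r1": 300, "r2": 50, "r3": 120})
print(attempts, order)
```

Trying the least-loaded replica first keeps the amount of traffic that must be redistributed after a removal small.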

We make a practical choice here to scale one MB type at a 
time. We view this as a reasonable choice because the scal- 
ing decision for a MB type (and indeed each scaling trial) is 
accompanied by careful placement of scaled instances (Section 5)
and redistribution of load across all MBs in the chain
(Section 6). The placement and distribution steps help address
network bottlenecks at downstream MBs.

Nevertheless, it is possible our scaling approach does not 
improve application performance, e.g., when two MB types 
are equally bottlenecked by compute resources. In such cases, 
we use a conservative fallback to the simple scale-all approach
and add new instances for all MB types in the chain.

Multi-chain Topologies. When a tenant has multiple chains 
in their deployment, we could consider running scaling trials 
in parallel for each chain. However, MB types can be shared 
across chains and thus a scaling trial will influence the out- 
come of other concurrent trials, and result in unnecessary or 
inadequate scaling. 

Another option is to scale each chain sequentially. We use 
this as a starting point, and speed it up by identifying the set 
of overlapping chains for each bottlenecked chain. 

Our approach to scaling in multi-chain topologies is shown
in lines 10-20 in Figure 7. In the simplest case, if a bottlenecked
chain shares no MB types, then we simply run the
single chain scaling procedure as discussed earlier (lines 15-16).
If one or more MB types overlap with another chain
and the overlapping chains are also bottlenecked, then we
guess that the common MB instances are the bottlenecks and
only run the scaling trial for these shared MB types (lines
19-20). On the other hand, if we have overlapping chains
with no bottlenecks, then we speculate that the MB types
unique to the current chain are bottlenecked and focus on
these instead (lines 17-18). The intuition here is that identifying
shared/isolated chains allows us to zoom in on the
bottlenecks faster. In the case where this heuristic fails to
improve performance (e.g., chains C1 and C2 share MB type
M that is a bottleneck for C1 but not C2), we err on the side
of caution and rerun the
scaling procedure considering the union of MBs across all
the chains in the set Overlap.³
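
The case analysis for multi-chain topologies can be sketched as a function that, for one bottlenecked chain, returns the MB types on which to run scaling trials. The data layout (a dictionary from chain name to its MB types) and the function name are assumptions for illustration, and the fallback to the union of MBs is not shown.

```python
def mb_types_to_trial(chain, chains, bottlenecked):
    """Pick the MB types to run scaling trials on for one bottlenecked chain.

    chains: maps chain name -> list of MB types in chain order.
    bottlenecked: set of chain names that are currently bottlenecked.
    """
    mine = chains[chain]
    # Chains that share at least one MB type with this chain.
    overlap = {
        other for other, mbs in chains.items()
        if other != chain and set(mbs) & set(mine)
    }
    shared_bottlenecks = overlap & bottlenecked
    shared_types = set().union(*(chains[o] for o in overlap)) & set(mine)

    if not overlap:                      # no sharing: trial the whole chain
        return list(mine)
    if not shared_bottlenecks:           # overlapping chains are healthy
        return [m for m in mine if m not in shared_types]
    return [m for m in mine if m in shared_types]  # shared MBs suspected


# Hypothetical tenant topology with three chains.
chains = {"C1": ["IDS", "RE", "LB"], "C2": ["FW", "RE"], "C3": ["Cache"]}

isolated = mb_types_to_trial("C3", chains, {"C3"})
unique_only = mb_types_to_trial("C1", chains, {"C1"})
shared_only = mb_types_to_trial("C1", chains, {"C1", "C2"})
print(isolated, unique_only, shared_only)
```

The returned list preserves chain order, so it can be passed directly to the single-chain trial loop.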

Network-awareness. Since each scale up/down trial relies
on the end-to-end application performance metrics, our approach
is implicitly network-aware. It may be possible to
design explicit approaches that combine monitoring CPU,
memory and I/O resources with utilization of the network
links used by a tenant's chain. However, it appears difficult
to precisely identify bottlenecks in such a setting and, more
importantly, to determine the extent to which they should be
scaled to meet application performance goals. We leave such
explicit approaches as a subject for future work. Nevertheless,
our evaluation of this implicit scheme shows a lower
bound on the benefits of network-awareness in scaling (§8).

³ This fallback requires a minimal amount of state at the Stratos
controller to track whether it has recently attempted a scaling trial
for a given chain.

Since our approach does not rely on VM-level measure- 
ments, it can be applied to tenant deployments with arbitrary 
MBs. In particular, tenants can compose cloud provider- 
offered MBs with those from third-party vendors creating 
diverse chains. 

5. RACK-AWARE PLACEMENT 

The bandwidth available on network links impacts several 
aspects of tenant deployments. Greater available network 
bandwidth on the path to and from an MB means better use 
of the MB's processing functionality. Greater network-wide 
available bandwidth also translates to more effective scaling 
decisions. Together these imply better application perfor- 
mance per unit cost (a function of #MBs in the chain) for 
a tenant. Optimal use of network capacity also allows the 
cloud provider to help elastically scale more tenant chains. 

As such, Stratos incorporates a placement module that 
maximizes the bandwidth available to a chain while also 
controlling the chain's network-wide footprint, even as the 
chain scales elastically. In what follows, we describe algo- 
rithms for two aspects of placement: initially mapping the 
MBs in a tenant's topology, and placing new MB instances. 

5.1 Initial Placement 

Initial MB placement is triggered whenever a new tenant 
arrives, or network-wide management actions occur (e.g., 
VM migration). 

There are two main inputs we use for initial placement: 
(1) The tenant-specified logical chains between MB types
and application VMs, along with the number of physical
instances of each MB type or application VM. Edges are
annotated with the gain/drop factor for each MB instance,
which is the ratio of the net traffic entering the MB to that
leaving it. We assume the tenant estimates these based on
prior history or expected traffic patterns. For example, with
an expected 50% redundancy in traffic, an RE MB would have a
gain/drop factor of 2 (compressed traffic entering the MB is
decompressed). These factors serve as weights on the edges
in a chain. (2) The available slots across different racks
and the available bandwidth of different links in the data
center topology. The latter is based on historical estimates
(e.g., mean, maximum, or k-th percentile) of link utilizations.
We assume a uniform distribution of load across all MBs of the
same type.
While this is a simplistic model, it still forms a helpful
basis for placement (especially vis-a-vis existing naive VM 
placement schemes that consider individual VMs in isola- 
tion; See ^TOl i. Given this, the placement algorithm has three 
logical stages: 

Partitioning. First, we partition the topology (entire graph 
corresponding to a tenant) with the goal of placing each par- 
tition in its entirety on a single rack so that we incur minimal 
inter-rack communication. That is, we partition the tenant's 
topology into K partitions such that, for each partition, there 
is at least one rack with enough available VM slots to
accommodate the partition. We adapt the classical min-K-cut
algorithm [28] to identify the partitions, starting with K = 1
and increasing K until all partitions are small enough to be 
accommodated. 
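
The partitioning loop can be illustrated with a small sketch. This is not the Stratos implementation: it swaps the min-K-cut algorithm for a brute-force search that is only practical for tiny topologies, and it assumes each partition must land on its own rack. The names `partition`, `fits`, and `cut_weight`, and the example topology, are hypothetical.

```python
from itertools import product

def cut_weight(edges, label):
    """Sum the weights of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in edges.items() if label[u] != label[v])

def fits(sizes, free_slots):
    """Check that every partition can get its own rack (largest to largest)."""
    if len(sizes) > len(free_slots):
        return False
    big_first = sorted(sizes, reverse=True)
    racks = sorted(free_slots, reverse=True)
    return all(s <= r for s, r in zip(big_first, racks))

def partition(nodes, edges, free_slots):
    """Start with K = 1 and increase K until some K-way partition fits.

    Among the partitions that fit, return the one with the smallest
    inter-partition (cut) weight. Brute force: exponential in len(nodes).
    """
    for k in range(1, len(nodes) + 1):
        best = None
        for labels in product(range(k), repeat=len(nodes)):
            if len(set(labels)) != k:          # require exactly k non-empty parts
                continue
            sizes = [labels.count(p) for p in range(k)]
            if not fits(sizes, free_slots):
                continue
            label = dict(zip(nodes, labels))
            weight = cut_weight(edges, label)
            if best is None or weight < best[0]:
                best = (weight, label)
        if best is not None:
            return k, best[0], best[1]
    return None

# Hypothetical tenant chain: client -> RE -> IPS -> server, edge weights = traffic.
nodes = ["client", "re", "ips", "server"]
edges = {("client", "re"): 10, ("re", "ips"): 5, ("ips", "server"): 10}
free_slots = [2, 2, 1]                          # free VM slots per rack
k, weight, label = partition(nodes, edges, free_slots)
```

With these inputs no single rack can hold all four VMs, so the search moves to K = 2 and cuts the lightest edge (RE to IPS).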

Assigning partitions to racks. The next stage is to assign 
racks for each partition. Here, we use a greedy approach 
that proceeds by sorting pairs of partitions in the decreasing 
order of the inter-partition communication. For each pair, 
if both partitions are unassigned to racks, we find a pair of 
racks with the highest available bandwidth to accommodate 
these two partitions. If one of the partitions in the pair is 
already assigned to a rack, then we simply find a new rack 
for the unassigned partition. (If both are assigned, we simply
move to the next pair.)
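
A minimal sketch of this greedy assignment follows. It makes several assumptions that the text does not state: each rack hosts at most one partition, available bandwidth is given per rack pair, and partitions that exchange no traffic are placed last on the rack with the most free slots. All names and numbers are hypothetical.

```python
def assign_racks(sizes, traffic, free, bw):
    """Greedily map partitions to racks.

    sizes:   {partition: number of VMs}
    traffic: {(p, q): inter-partition traffic volume}
    free:    {rack: free VM slots}
    bw:      {frozenset({r1, r2}): available bandwidth between the racks}
    """
    placed = {}

    def candidates(p):
        # Racks not yet used by another partition that can hold partition p.
        return [r for r in free
                if r not in placed.values() and free[r] >= sizes[p]]

    # Visit partition pairs in decreasing order of inter-partition traffic.
    for (p, q), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        if p in placed and q in placed:
            continue                             # both assigned: next pair
        if p not in placed and q not in placed:
            pairs = [(bw[frozenset((a, b))], a, b)
                     for a in candidates(p) for b in candidates(q) if a != b]
            _, a, b = max(pairs)                 # rack pair with most bandwidth
            placed[p], placed[q] = a, b
        else:
            new, old = (p, q) if q in placed else (q, p)
            placed[new] = max(
                candidates(new),
                key=lambda r: bw[frozenset((r, placed[old]))])

    for p in sizes:                              # partitions with no traffic
        if p not in placed:
            placed[p] = max(candidates(p), key=lambda r: free[r])
    return placed

sizes = {"P0": 2, "P1": 2, "P2": 1}
traffic = {("P0", "P1"): 50, ("P1", "P2"): 10}
free = {"r1": 2, "r2": 2, "r3": 1, "r4": 3}
bw = {frozenset(("r1", "r2")): 100, frozenset(("r1", "r3")): 40,
      frozenset(("r1", "r4")): 60, frozenset(("r2", "r3")): 80,
      frozenset(("r2", "r4")): 30, frozenset(("r3", "r4")): 20}
placed = assign_racks(sizes, traffic, free, bw)
```

The sketch raises an error when no rack can hold a partition; a fuller version would fall back to re-partitioning with a larger K.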

Assigning VMs to slots. Last, we assign VMs (i.e., MBs 
and application VMs) within each partition to slots in the 
racks. In case there is just one slot per (physical) machine, 
we randomly pick a slot and assign it to a VM. If there are 
more available slots, we follow a similar procedure to par- 
tition the VMs so that VMs that communicate more among 
each other can be assigned closer to each other.

5.2 Placing New Middlebox Instances 

New MBs launched after scaling a chain need to be placed 
efficiently for scaling to be effective. Ideally, the new MB 
placement should also help support future scale up for both 
the tenant in question and other tenants. Our heuristic is
driven by these goals. 

To more accurately account for the network interaction of 
the scaled MBs, we dynamically track the gain/drop factors 
for MBs in the tenant's topology based on prevalent traf- 
fic patterns at each MB (using EWMA). Placement of the 
scaled MB considers the estimated ratios for the flows from 
MB's input and output VMs (those supplying traffic to and 
receiving from the MB, respectively) as input. Placement 
then works as follows: 

If the new instance can be accommodated in the same rack 
as its input MBs (or VMs) and output MBs (or VMs) then we 
place the new instance in the same rack. However, if the new 
instance cannot be accommodated in the same rack, we se- 
lect a candidate rack (rack with free slots) that has the max- 
imum available bandwidth to the rack for input and output 
MBs. When the input and output MBs are in different racks, 
we consider each candidate rack and estimate the inter-rack
MB traffic using network-aware flow distribution (discussed
in the next section), assuming that the new MB is placed in 
the candidate rack. We select the rack that minimizes the 
weighted sum of inter-rack flows (or maximizes the band- 
width available to inter-rack flows). 
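
The decision logic above can be sketched as follows. This is a simplification: instead of invoking the flow distribution module for each candidate rack, it takes fixed estimates of the traffic the new instance would receive (`in_vol`) and emit (`out_vol`). Those parameters and the function name are assumptions made for illustration.

```python
def place_new_instance(in_rack, out_rack, free, bw, in_vol, out_vol):
    """Pick a rack for a newly launched MB instance.

    in_rack / out_rack: racks of the instance's input and output MBs (or VMs)
    free: {rack: free VM slots}
    bw:   {frozenset({r1, r2}): available bandwidth between the racks}
    in_vol / out_vol: estimated traffic into and out of the new instance
    """
    candidates = [r for r, slots in free.items() if slots > 0]
    if not candidates:
        return None
    if in_rack == out_rack:
        if in_rack in candidates:
            return in_rack                       # co-locate when possible
        # Otherwise: the candidate with most bandwidth to that rack.
        return max(candidates, key=lambda r: bw[frozenset((r, in_rack))])

    def inter_rack_traffic(r):
        # Traffic crosses racks on each side that is not co-located with r.
        return in_vol * (r != in_rack) + out_vol * (r != out_rack)

    return min(candidates, key=inter_rack_traffic)

free = {"A": 0, "B": 1, "C": 2}
bw = {frozenset(("A", "B")): 10, frozenset(("A", "C")): 50,
      frozenset(("B", "C")): 5}
same_rack_full = place_new_instance("A", "A", free, bw, 10, 10)
split_racks = place_new_instance("B", "C", free, bw, in_vol=10, out_vol=20)
```

In the second call the heavier flow is on the output side, so placing the instance next to its output MBs keeps more traffic inside a rack.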

6. NETWORK-AWARE FLOW DISTRIBUTION

Akin to placement, Stratos's flow distribution module ac- 
tively manages how MBs use network capacity. In contrast 
with placement, however, flow distribution can be invoked at 
fine time-scales. 

Flow distribution is triggered whenever a scale up/down 
decision is made. In particular, the new instance placement
heuristic in Section 5.2 invokes flow distribution when
considering the optimal location for the scaled instance. Flow
re-distribution can also be triggered whenever the gain factor
of an MB instance changes significantly (e.g., from 2 to 1 for
the RE MB); Stratos periodically monitors each chain
for such changes. Finally, based on periodic input about net- 
work utilization from the cloud provider's monitoring func- 
tionality, flow re-distribution can be triggered across multi- 
ple tenant chains in response to changes in background (non- 
Stratos) network traffic. This helps maximize the bandwidth 
available for intra-chain communications and improves ten- 
ants' application performance. The latter two re-distribution 
attempts happen at the same periodicity in our prototype. 

In essence, flow distribution helps provide fine-grained 
optimization of chain performance as well as control over 
chain network footprint for a given physical deployment of 
the chain. The key here is that we need to adjust traffic across 
the entire set of chains of a tenant, as focusing just on the 
scaled instance may result in less-than-ideal improvements 
in tenant applications' performance. 



[Figure 8 diagram: a chain Type1 -> Type2 -> Type3, annotated
with f(chain=1, mbox=3, mbox=8) and Cost(mbox=3, mbox=8).]

Figure 8: Example tenant topology to explain the terms in
the LP framework for network-aware distribution. For
clarity, we do not show the gain factors on the edges.

Next, we describe a systematic linear-programming (LP) 
based framework that formally captures the problem of network- 
aware flow distribution. As such, the logic we describe here 
is general and applies to multiple scenarios in which such 
flow distribution is invoked; for instance, the common case 
is when the distribution module is triggered as a result of 
elastic scaling. The module may also be triggered due to 
changes in the background traffic as well as changes in the
gain factors for different MBs in a chain as a result of
workload changes for a given tenant. Furthermore, this logic
easily extends to the multi-tenant scenario with multiple chains
per tenant; we simply consider the union of all chains across 
all tenants. 

Notation. Let c denote a specific chain and V_c be the
total volume (flows) of traffic that require processing via this
chain. There may be different types of MBs (e.g., IDS, RE)
within a chain; |c| is the number of MBs in a given chain
c. Let c[j] be the type of the middlebox that is at position
j in the chain c (e.g., IDS, RE). Let k denote the type of a
middlebox and M_k be the set of MB instances of type k that
the scaling module has launched. Thus, M_c[j] is the set of
MB instances of type c[j]; we use i ∈ M_c[j] to specify that
an MB instance i belongs to this type. Figure 8 gives a quick
overview of the different entities involved in this
formulation.

LP Formulation. Our goal is to split the traffic across the
instances of each type such that: (a) the processing
responsibilities are distributed roughly equally across them and (b)
the aggregate network footprint is minimized. Thus, we need
to determine how the traffic is routed between different MBs.
Let f(c, i, i') denote the volume of traffic in chain c being
routed from middlebox i to the instance i' (see Figure 8).
As a special case, f(c, i) denotes traffic routed to the first
middlebox in a chain from a source element.*

Suppose each unit of traffic flowing between a pair of
instances incurs some network-level cost; Cost(i -> i')
denotes the network-level cost between two instances. In the
simplest case, this is a binary variable: 1 if the two MBs
are in different racks and 0 otherwise. (We can use more
advanced measures to capture latency or available bandwidth
as well.)
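
To make the binary cost concrete, the sketch below computes the network footprint of two candidate flow splits for a pair of RE and IPS instances. The rack layout, the instance names, and the volumes are invented for illustration only.

```python
def cost(rack_of, i, j):
    """Binary network-level cost: 1 if i and j sit in different racks, else 0."""
    return 0 if rack_of[i] == rack_of[j] else 1

def footprint(flows, rack_of):
    """Total volume-weighted cost of a flow assignment.

    flows: {(chain, src_instance, dst_instance): traffic volume}
    """
    return sum(vol * cost(rack_of, i, j) for (_, i, j), vol in flows.items())

rack_of = {"re1": "A", "re2": "B", "ips1": "A", "ips2": "B"}

# Uniform split: every RE instance sends equally to every IPS instance.
uniform = {(1, "re1", "ips1"): 25, (1, "re1", "ips2"): 25,
           (1, "re2", "ips1"): 25, (1, "re2", "ips2"): 25}

# Rack-local split: same per-instance load, but no traffic crosses racks.
local = {(1, "re1", "ips1"): 50, (1, "re2", "ips2"): 50}

uniform_cost = footprint(uniform, rack_of)
local_cost = footprint(local, rack_of)
```

Both splits give each IPS instance 50 units of load, but only the uniform split sends traffic (50 units) across racks.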

Given this setup, Figure 9 formalizes the network-aware
flow distribution problem that Stratos solves. Here, Eq (1)
captures the network-wide footprint of routing traffic
between potential instances of the j-th MB in a chain to the
(j+1)-th MB in that chain. For completeness, we consider all
possible combinations of routing traffic from one instance to
another. In practice, the optimization will prefer only
combinations that have low footprints.

Eq (2) models a flow conservation principle. For each
chain and for each position in the chain, the volume of traffic
entering the middlebox has to be equal to the volume exiting
it to the next middlebox type in the sequence. Since
middleboxes may change the aggregate volume (e.g., a firewall may
drop traffic or RE may compress traffic), we consider a
generalized notion of conservation that also takes into account
the expected gain/drop factor γ(c, j), which is the ratio of
incoming-to-outgoing traffic at the position j for the chain c.
For initial placement, we expect the tenant to provide these
factors as annotations to the logical topology specification;

*For clarity, we focus only on the forward direction of the chain 
noting that our implementation uses an extended formulation that 
captures bidirectional chains as well. 
Figure 9: LP formulation for the network-aware flow
distribution problem. The slack term in the last equation simply
represents that we have some leeway in allowing the load to
be within 10-20% of the mean.

the tenant could derive these based on expected traffic
patterns or history. Stratos periodically recomputes these gain
factors based on the observed input-output ratios for each 
chain. 
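
Section 5.2 notes that gain/drop factors are tracked with an EWMA. A minimal sketch of such an update is shown below, using the incoming-to-outgoing ratio defined above; the smoothing weight `alpha` and the function name are assumptions, since the text does not give them.

```python
def update_gain(prev, bytes_in, bytes_out, alpha=0.2):
    """Fold one observed in/out ratio into a running EWMA of the gain factor.

    prev is the previous estimate, or None before the first observation.
    alpha is the weight of the newest sample (hypothetical default).
    """
    sample = bytes_in / bytes_out
    if prev is None:
        return sample
    return alpha * sample + (1 - alpha) * prev

gain = update_gain(None, bytes_in=100, bytes_out=50)        # first sample
gain = update_gain(gain, bytes_in=100, bytes_out=100, alpha=0.5)
```

A larger `alpha` reacts faster to workload shifts but also makes re-distribution more likely to trigger on short-lived fluctuations.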

In addition to this flow conservation, we also need to
ensure that each chain's aggregate traffic will be processed;
thus we also model this coverage constraint in Eq (3).
Finally, we want to ensure that within each middlebox type,
the load is roughly evenly distributed across the instances of
that type in Eq (4). Here, we use a general notion of load
balancing where we can allow for some leeway; say within
10-20% of the targeted average load.
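
The equations of Figure 9 are not reproduced in this text. The block below is one way to write a linear program that matches the prose description of Eqs (1)-(4); it is a reconstruction under stated assumptions, not the authors' exact formulation. The symbols L_i (total traffic entering instance i), its per-type mean, and the slack ε are notation introduced here; γ(c, j) stands for the gain/drop factor at position j of chain c.

```latex
\begin{align}
\min \quad & \sum_{c} \sum_{j=1}^{|c|-1} \sum_{i \in M_{c[j]}} \sum_{i' \in M_{c[j+1]}}
             f(c,i,i') \cdot \mathit{Cost}(i \rightarrow i') \tag{1} \\
\text{s.t.} \quad
& \sum_{i'' \in M_{c[j-1]}} f(c,i'',i) \;=\; \gamma(c,j) \sum_{i' \in M_{c[j+1]}} f(c,i,i')
  \qquad \forall c,\; 1 < j < |c|,\; i \in M_{c[j]} \tag{2} \\
& \sum_{i \in M_{c[1]}} f(c,i) \;=\; V_c \qquad \forall c \tag{3} \\
& (1-\epsilon)\,\overline{L}_k \;\le\; L_i \;\le\; (1+\epsilon)\,\overline{L}_k
  \qquad \forall k,\; i \in M_k \tag{4}
\end{align}
```

For the first position in a chain, the left-hand side of (2) is the source traffic f(c, i) rather than a sum over a preceding MB type. Setting ε between 0.1 and 0.2 corresponds to the 10-20% leeway mentioned above.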

We must ensure that the periodic flow redistributions and
flow distribution accompanying scaling don't enter into race
conditions. We take two steps for this. First, any scaling
attempt in a chain is preceded by a redistribution. Only
if redistribution does not suffice does Stratos initiate scaling
trials. Second, Stratos suspends all redistributions during the
time when scaling trials are being run across a given tenant's
deployment.
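
The two steps can be sketched as a small coordinator. This is a single-threaded illustration with hypothetical names; a real controller would need proper locking, and the callbacks stand in for the actual redistribution and scaling logic.

```python
class ChainCoordinator:
    """Serializes periodic redistribution and scaling for one tenant."""

    def __init__(self, redistribute, scale):
        # redistribute() returns True if it resolved the bottleneck.
        self.redistribute = redistribute
        self.scale = scale
        self.scaling_in_progress = False

    def periodic_redistribution(self):
        """Run the periodic redistribution unless scaling trials are running."""
        if self.scaling_in_progress:
            return False                  # suspended during scaling trials
        self.redistribute()
        return True

    def handle_bottleneck(self):
        """Try redistribution first; start scaling trials only if it fails."""
        if self.redistribute():
            return "redistributed"
        self.scaling_in_progress = True
        try:
            self.scale()
        finally:
            self.scaling_in_progress = False
        return "scaled"

log = []

def fake_redistribute():
    log.append("redistribute")
    return False                          # pretend the bottleneck remains

def fake_scale():
    log.append("scale")
    # A periodic redistribution arriving mid-trial is refused.
    log.append(coordinator.periodic_redistribution())

coordinator = ChainCoordinator(fake_redistribute, fake_scale)
outcome = coordinator.handle_bottleneck()
```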

7. IMPLEMENTATION 

We have implemented a full-featured Stratos prototype
capable of running on commodity x86-64 hardware. Figure 10
shows an overview of the components involved.

Stratos Data Plane. The Stratos data plane is a configurable
overlay network, realized through packet encapsulation and
programmable software switches. Each tenant VM has a
pair of virtual interfaces that tap one of two Open vSwitches
within the host's privileged domain. Packets sent to one of
the virtual interfaces are transmitted via a GRE tunnel to
the software switch on the host of the destination VM, from
where they are bridged to the appropriate destination interface.
The other interface is reserved for management traffic. Open
vSwitch holds the responsibility for encapsulating packets
for transmission across the network.

Figure 10: Stratos prototype implementation

Traffic is directed between the local host and the correct 
destination server using Open vSwitch. A single bridge (i.e., 
switch) on each privileged domain contains a virtual
interface per tenant VM. Forwarding rules are matched based on
the switch port on which a packet arrived, the final destination of
the packet, and a tag stored in the IP Type of Service (TOS) 
field. Using tags reduces the number of flow entries in the 
switches, providing an important performance boost. For- 
warding rules are installed by the central Stratos controller. 
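
The matching scheme can be modelled with a plain lookup table, as in the sketch below. This is only a conceptual model in Python, not Open vSwitch or OpenFlow syntax; the port numbers, addresses, and tag values are invented.

```python
from typing import Dict, Optional, Tuple

# Key: (ingress switch port, destination IP, tag carried in the IP TOS field).
RuleKey = Tuple[int, str, int]

def install_rule(rules: Dict[RuleKey, int], in_port: int, dst_ip: str,
                 tos_tag: int, out_port: int) -> None:
    """Record one forwarding rule, as the controller would push it."""
    rules[(in_port, dst_ip, tos_tag)] = out_port

def lookup(rules: Dict[RuleKey, int], in_port: int, dst_ip: str,
           tos_tag: int) -> Optional[int]:
    """Return the output port for a packet, or None if no rule matches."""
    return rules.get((in_port, dst_ip, tos_tag))

rules: Dict[RuleKey, int] = {}
# Same ingress port and destination, but the tag selects a different next hop.
install_rule(rules, in_port=1, dst_ip="10.0.0.5", tos_tag=1, out_port=3)
install_rule(rules, in_port=1, dst_ip="10.0.0.5", tos_tag=2, out_port=4)

first_hop = lookup(rules, 1, "10.0.0.5", 1)
second_hop = lookup(rules, 1, "10.0.0.5", 2)
```

Because the tag identifies where a packet is in its chain, a rule can be shared by all flows carrying that tag rather than being installed per flow.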

Stratos Controller. The Stratos controller is implemented
as an application running atop Floodlight [6] and interfaces
with the Open vSwitch instances using the OpenFlow
protocol [27]. The controller application takes a logical
topology as input, which defines the tenant's chains and the VM
instances of each client/server/MB in the chains. The
controller transforms this topology into a set of forwarding rules
which are installed in the Open vSwitch instances in each
physical host. The controller also gathers performance
metrics from network switches, application end-points and MBs
using SNMP. These inputs are used by the rest of the
modules in the controller, namely, those for scaling, placement
and flow distribution. Our controller launches and termi- 
nates VMs using Xen ifTSl . 

8. EVALUATION 

We evaluate Stratos in three different ways: First, we con- 
duct controlled testbed experiments using our prototype to 
examine in detail the benefits of different components of
Stratos: application-aware scaling, placement, and load
distribution. Second, we run a modified version of our
prototype on EC2 to understand the performance of Stratos in a
dynamic scenario. Since EC2 does not provide control over 
placement, this prototype can only perform network-aware 
scaling and load distribution. Finally, we simulate Stratos to 
understand the benefits of Stratos at scale. 



There are three dimensions in our evaluation: (1) Choice
of scaling approach: leveraging CPU and memory
utilization at an MB to determine if it is a bottleneck (threshold), vs.
using application-aware scaling (aware); (2) Placement:
randomly selecting a rack (rand) or using our network-aware
placement (aware); (3) Flow distribution: either uniform or
network-aware flow distribution. We assume that both
initial and scaled instance deployment use identical placement
and load distribution schemes.

We study a variety of metrics: the effectiveness of scal- 
ing decisions both in terms of when they are triggered and 
how many MBs are used, the throughput of tenant applica- 
tions, unmet demand, and utilization of MBs and provider's 
infrastructure. 

8.1 Controlled Testbed Experiments 

Our testbed consists of 24 machines, with 3 VM slots
each, deployed uniformly across 8 racks. The Stratos
controller runs on a separate, purpose-specific machine. Unless
otherwise specified, we consider a single tenant whose
logical topology is a single chain consisting of a client, an RE MB,
an IPS MB (standalone throughputs of 240 and 80Mbps,
respectively), and servers. The RE and IPS MBs use Click [16]
and Suricata 1.1.1 [13], respectively.

We build a multi-threaded workload generator that works 
between a client-server pair in the following manner: the 
threads running at a client share a (sufficiently large) token 
bucket that fills at a rate specified by a workload pattern (e.g.,
steady, increasing, or sine-wave). A client thread draws a 
single token from the bucket prior to initiating a connection 
to the server; if none are available, it blocks. New
connections are issued by a client only after the previous connection
finishes and another token has been obtained. The number
of outstanding tokens indicates the unmet demand, and each 
token corresponds to a request of 100KB. 
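
The token-bucket mechanism can be sketched as below. To keep the example deterministic it uses simulated time and a non-blocking `try_take`, whereas the generator described above uses real client threads that block; the class and method names are hypothetical.

```python
class TokenBucket:
    """Bucket filled at a rate given by a workload pattern (tokens/second)."""

    REQUEST_KB = 100                      # each token stands for a 100KB request

    def __init__(self, rate_fn):
        self.rate_fn = rate_fn            # maps simulated time to a fill rate
        self.tokens = 0.0
        self.now = 0.0

    def advance(self, dt):
        """Move simulated time forward by dt seconds and add new tokens."""
        self.tokens += self.rate_fn(self.now) * dt
        self.now += dt

    def try_take(self):
        """Take one token if available (a real client thread would block)."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def unmet_demand_kb(self):
        """Outstanding tokens, expressed as kilobytes of unserved requests."""
        return int(self.tokens) * self.REQUEST_KB

def steady(_t):
    return 5.0                            # "steady" pattern: 5 requests/second

bucket = TokenBucket(steady)
bucket.advance(2.0)                       # 10 tokens accumulate
served = sum(bucket.try_take() for _ in range(4))
unmet = bucket.unmet_demand_kb()
```

An increasing or sine-wave pattern only requires a different `rate_fn`.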

We impose background traffic in our experiments by run- 
ning our workload generator ("steady" pattern) across spe- 
cific pairs of MBs in our testbed. We experiment both with 
fixed and variable background traffic patterns; we focus largely 
on results for the former for brevity. 

Overall benefits. We ran Stratos atop the testbed using a
linearly increasing workload pattern. Background traffic was
fixed at such a rate that utilization of the aggregation links
in our topology varied from 25 to 50%. Figure 11 shows an
execution of Stratos, which we describe as aware / aware /
aware, meaning that scaling is initiated in response to
application demand, and that MB placement and flow
distribution are both network-aware. We first compare it against a
completely network-agnostic approach, labeled threshold /
rand / uniform, wherein scaling decisions are entirely based
on CPU load exceeding 80 percent for a period of five
seconds. From Figure 11(a), we note that the naive approach's
throughput starts to drop at around 300s, when the unmet
demand skyrockets. In contrast, Stratos has sustained high
throughput (measured in requests per second per process,
while nine processes execute concurrently) and no
significant unmet demand. Figure 11(b) shows the corresponding
scaling decisions. We see that Stratos uses 2X fewer
instances than the naive threshold / rand / uniform approach,
yet it offers better throughput. Moreover, comparing the
figures describing Stratos's scaling behavior with the
corresponding demand graphs, it is apparent that Stratos's ability
to scale to meet increasing demand is unhindered by its initial
economy of MB allocation.

Figure 11: Number of MBs used (a - top) and throughput
and unmet demand (b - bottom)

Next, we attempt to tease apart the relative contribution of 
the three network-aware components in Stratos. 

Application-aware scaling benefits. Figure 11(b) also shows
the number of MB instances used by two other schemes:
threshold / aware / aware and aware / rand / uniform. Taking
all four schemes into account together, we notice that the
application-aware scaling heuristic outperforms naive
scaling (aware/* versus threshold/*), using nearly 2X fewer
instances. In terms of throughput, we noticed that aware /
aware / aware is about 10% better than threshold / aware / aware,
whereas aware / rand / uniform is actually about 10% lower
in throughput than threshold / rand / uniform (results omitted
for brevity).

Taken together, these results indicate that, while the
application-aware scaling heuristic helps scale the appropriate MBs,
resulting in fewer MBs being used, it critically relies on
placement and load balancing to be network-aware in order to
make effective use of MB capacity and to offer optimal
application-level performance. We explore the role of placement and load
balancing in more detail next.

Placement. We first examine the impact of network-aware
placement decisions in Stratos. We run Stratos and aware /
rand / aware against the same fixed background traffic and
workload.

Figure 12: Effect of placement decisions (a - top) on
throughput and unmet demand (b - bottom) with fixed
background traffic. Unmet demand is shown using dashed lines.

The results are shown in Figure 12(a). We
immediately see that aware / rand / aware attempts to scale
significantly more frequently than Stratos, and that those attempts
usually fail. As is shown by Figure 12(b), these attempts to
scale up are the result of spikes in unsatisfied demand, which
require multiple scaling attempts to accommodate.

By contrast, it is apparent from these figures both that 
Stratos needs to attempt to scale much less often, and that, 
when it does, those attempts are significantly more likely to 
be successful. 

Flow Distribution. We next examine the impact of
network-aware flow distribution in Stratos. As before, we run Stratos
and aware / aware / uniform against the same background
traffic and workload so as to ascertain their behavioral
differences.

We see that, in order to satisfy the same demand, aware 
/ aware / uniform requires more middlebox instances than 
Stratos. More significantly, though, we see Stratos is nonethe- 
less better situated to respond to surges in demand; it is able 
to satisfy queued requests quicker, with less scaling, and 
with less "turbulence" in subsequent traffic. 

Although these results employ a small scale testbed with
synthetic traffic patterns, they serve to highlight the
importance of the individual components of Stratos. Specifically,
making any one component network-agnostic results in
using more MBs than necessary, poor throughput and
substantial buildup of unmet demand. We also experimented with
variable background traffic and different workload patterns, and
found the above observations to hold qualitatively. We
provide further evidence using our EC2 prototype and
simulations.

Figure 13: Effect of flow distribution decisions on scaling
(a - top) and on demand satisfaction (b - bottom) with fixed
background traffic. Unmet demand is shown using dashed
lines.

8.2 (Restricted) Stratos in a Dynamic Scenario 

Prototype details. Our EC2 prototype is similar to our full- 
fledged prototype minus network-aware placement. Instead 
we rely on EC2 to place any and all MBs; this is something 
we cannot control. To enable network-aware load distri- 
bution, we periodically collect available bandwidth using a 
packet-pair-based measurement tool lISTI between adjacent 
MBs in a tenant's deployment. 

Multi-chain tenant deployment. Whereas the previous
experiments used a simple chain, we now have the tenant
deploy the multi-chain setup shown in Figure 5. Each client
VM runs httperf [7] to request a 50KB file from a
corresponding server VM running Apache (thus, client A requests
from server A). We deploy each MB as a small EC2 instance
to emulate bottlenecks; client, server, and tagger are large
instances; the controller runs on a micro instance.
We mark a chain as being bottlenecked
if there is a sustained unmet demand of 2.8 Mbps for a
period of at least 20 seconds. We use a 25 second gap between
scaling trials, and we use a 2 Mbps improvement threshold
to retain an instance.
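
The two thresholds can be expressed as simple checks, sketched below. The sampling format and function names are assumptions; only the 2.8 Mbps, 20 second, and 2 Mbps values come from the text.

```python
def is_bottlenecked(samples, threshold_mbps=2.8, window_s=20):
    """Detect sustained unmet demand.

    samples: list of (timestamp_seconds, unmet_demand_mbps), sorted by time.
    Returns True once unmet demand has stayed at or above the threshold
    for at least window_s seconds without interruption.
    """
    run_start = None
    for t, unmet in samples:
        if unmet >= threshold_mbps:
            if run_start is None:
                run_start = t
            if t - run_start >= window_s:
                return True
        else:
            run_start = None              # the run of unmet demand was broken
    return False

def keep_instance(served_before_mbps, served_after_mbps, min_gain_mbps=2.0):
    """Retain a trial instance only if it improved served demand enough."""
    return served_after_mbps - served_before_mbps >= min_gain_mbps

sustained = [(t, 3.0) for t in range(0, 25, 5)]           # 0..20s, always high
interrupted = [(0, 3.0), (5, 3.0), (10, 1.0), (15, 3.0), (20, 3.0), (25, 3.0)]
```

Here `is_bottlenecked(sustained)` holds because unmet demand stays high for the full 20 seconds, while the dip at 10 seconds resets the window for `interrupted`.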

EC2 Setup Latency. We first measure the setup overhead
associated with Stratos. The setup cost includes the time
required to launch the data plane components (taps and switch)
on each VM, transform the logical chains into per-VM
configurations, and configure each VM's data plane components
(Table 1). The total setup time for our example chain (with
one instance of each MB) is ≈12s (high because EC2 does
not allow parallel deployment/setup of VMs). Relative to the
time to launch a VM (on the order of a few tens of seconds),
this represents a small overhead.

Task                                                Time
Logical-to-Physical                                 5ms
Data Plane Setup (Create Tunnels)                   2.4s per VM
Data Plane Config (Install Rules in Open vSwitch)   3ms per VM

Table 1: Stratos setup latency

Figure 14: Multiple chain scaling

Effectiveness of Scaling. To emulate induced bottlenecks
in the shared (X, Y) or unshared (W, Z) MBs (see Figure 5),
we use artificial Click [25] MBs that rate limit packets at
5.5K, 9K, 7K, and 10K packets/second for instances of W,
X, Y, and Z, respectively. We impose an initial demand of
16Mbps on each chain, increasing demand by 4Mbps
every 2 minutes. Figure 14 shows the scaling result and the
application performance. The shared MBs become
bottlenecked first, because they incur load from both clients. Our
heuristic accurately attempts to scale these MBs first; it does
not attempt to scale the unshared MBs because the
bottleneck is eliminated by first adding two instances of Y and
then an instance of X. When demand increases to 36Mbps
on each chain, W becomes a bottleneck for Chain 1, which
our heuristic rightly scales, without conducting unnecessary
scaling trials for X, Y, or Z.

Our approach ensures that application demand is entirely 
served most of the time. No gap between demand and served load
persists for longer than 60 seconds. Without our extension, 
chains would need to be scaled sequentially, increasing the 
duration of these gaps. For example, the gap at 240s would 
persist for an additional 25s, while an unnecessary scaling 
trial was conducted with W prior to scaling trials with X and 
Y. 

Effectiveness of Flow Distribution. We now evaluate the
benefits of network-aware flow distribution. We compare
uniform and network-aware flow distribution for a single
point in the scaling space (3 RE and 4 IPS) for the
single chain. The MB instances are clustered into two groups,
limiting the flow of traffic between the groups to 12K
packets per second. Application demand starts at 60Mbps and
increases by 10Mbps every 2 minutes.

Figure 15: Application goodput with uniform and
network-aware flow distribution at a fixed level of scaling

Figure 15 compares the percent of application demand
served under the two distribution mechanisms. We observe 
that the same set of MBs is able to serve higher demand 
when network-aware flow distribution is employed: with a 
demand of 100Mbps, 90% is served under network-aware 
distribution versus only about 75% with uniform distribu- 
tion. (The consistent 5% of unserved demand with network- 
aware distribution is a result of EC2 network variability be- 
tween our runs, which further highlights the need for a Stratos- 
like approach for simplifying MB management.) 

8.3 Simulations: Stratos at Scale 

Simulation setup. We developed a simulator to evaluate the 
macroscopic benefits of Stratos at large scales. While we 
examined complex scenarios using the simulator, we present 
results using somewhat restrictive setups for clarity.
Specifically, for the scenarios below, the simulator takes as input:
(1) a data center topology consisting of racks and switches,
(2) the number of tenants, (3) chain with elements and initial
instances (all tenants use the same deployment pattern), and
(4) a fixed application demand (in Mbps) common across
tenants.

We run our simulator to place 200 tenants within a 500-rack
data center. We run the network-aware scaling heuristic
for each tenant until the tenant's full demand is satisfied
or no further performance improvement can be achieved.
The data center is arranged in a tree topology with 10 VM
slots per rack and a capacity of 1Gbps on each network link.
All tenants use the same deployment: a simple chain
containing clients (3 instances), MB-type1 (2), MB-type2 (1),
MB-type3 (2), and servers (4), which initially consists of
12 VMs; thus every tenant is forced to spread her VMs across
racks. The capacity of each instance of MB-type1, type2,
and type3 is fixed at 60, 50, and 110Mbps, respectively. The
application demand between each client and server pair is
100Mbps, for a total traffic demand of 300Mbps. We
assume intra-rack links are very high capacity.
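
As a back-of-the-envelope check on this setup, the sketch below computes how many instances of each MB type would be needed to carry the full 300Mbps if the network were not a constraint. It assumes a gain/drop factor of 1 at every MB and perfectly even load, neither of which is stated for the simulation, so the result is only a lower bound on the scaling required.

```python
import math

demand_mbps = 300
capacity_mbps = {"MB-type1": 60, "MB-type2": 50, "MB-type3": 110}
initial = {"MB-type1": 2, "MB-type2": 1, "MB-type3": 2}

# Instances needed per type to carry the full demand (compute-bound only).
needed = {mb: math.ceil(demand_mbps / cap) for mb, cap in capacity_mbps.items()}

# Instances the scaling heuristic would have to add beyond the initial ones.
to_add = {mb: max(0, needed[mb] - initial[mb]) for mb in needed}
```

Under these assumptions the chain needs 5, 6, and 3 instances of the three MB types, i.e. 9 additional VMs on top of the initial 12.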

First, we look at the tenant demand that can be served
under different combinations of placement and flow
distribution during scaling (Figure 16(a)): we assume all tenant
deployments are initially placed in a network-aware fashion.
We observe immediately that aware placement/aware
distribution is the best, in that a greater fraction of the demand
can be served across all tenants than the remaining
combinations. At the other extreme, random placement coupled
with uniform distribution results in less than 30% of demand
served across all tenants. The other possibilities offer
intermediate performance as expected, with random/aware
outperforming aware/uniform; this indicates the relative
importance of network-aware load distribution compared to
network-aware placement of scaling instances (note that all chains
initially are placed in a network-aware fashion).

Performance per $. Tenants are currently charged based
on the number of instances provisioned. Thus it is crucial
that tenants maximally utilize their MB instances. Because
Stratos actively manages MB interactions, it helps improve
the bandwidth available between successive MBs in a
deployment, thereby helping MB resources to be used more
effectively. We illustrate the benefits of this next. Figure 16(b)
presents a CDF of the amount of traffic served for each
tenant relative to the number of instances deployed. Aware
distribution results in a significant increase in the amount of
traffic served per-instance for the median tenant with both
placement algorithms: 8MBps with aware placement and
2MBps with rand. As before, we see the greater
importance of network-aware load distribution relative to
placement.
Figure 16: Tenant load served (a - top) and Traffic served
divided by number of instances (b - bottom)

Provider view. Figure 17 presents a CDF of the amount of
inter-rack traffic generated by each tenant's chain.
Interestingly, tenants cause a high percentage of the data center's
network to be utilized with the aware placement and load
distribution. This is because when both network-aware placement
and load distribution are used, tenants are able to scale out
more and more closely match their demand, thereby pushing
more bytes out into the data center network. On the whole,
the data center infrastructure is more effectively utilized.

8.4 Summary of Key Results 

Our key findings are that: 
• Stratos helps optimally meet application demand by
accurately identifying and addressing bottlenecks. In
contrast, network-agnostic approaches use up to 2X as
many MBs as Stratos, yet they have severely
backlogged request queues.

• All three network-aware components of Stratos are cru- 
cial to extracting the ideal overall benefits of Stratos. 

• Even without intrinsic support for placement, Stratos 
can elastically meet the demands of applications in EC2. 
Stratos's fine-grained load distribution plays a crucial 
role in sustaining application performance despite chang- 
ing network conditions. 

9. DISCUSSION 

Integration of Stratos with MBs. Stratos can be improved 
overall by having it be aware of MB functions. For example, 
if Stratos knows the duplication patterns in specific traffic 
flows, then it can use this to more carefully decide which 
flows to send to specific replicas of a redundancy elimination 
MB. MBs can benefit from knowing about Stratos too; e.g., 
a server load balancer can use the network load distribution 
patterns imposed by Stratos, together with server load, in 
deciding how to balance requests across servers. 

Failure Resilience. Our placement heuristics are
performance-centered and hence they impose rack-aware allocations.
However, this may not be desirable for tenants who want their
ever, this may not be desirable for tenants who want their 
deployments to be highly available. Our placement heuris- 
tics can be adapted for such tenants to distribute VMs across 
racks for availability reasons, while also minimizing net- 
work footprint. The simplest extension is to modify the map 
of available VM slots such that there is at most one slot avail- 
able per machine or one per rack for a given tenant. 

Zero Downtime. As mentioned in Section 3, when a
collection of VMs is ready to be migrated, re-placement may
be invoked across several tenant deployments (even those
whose VMs are not among the set being migrated) to find
new globally-optimal allocations. There is a concern that
this may impose down-time on tenant deployments, because
their active traffic flows may either have to be suspended or
they may be lost in the transition. To minimize such network
downtime, we can leverage support mechanisms available to
clouds today, e.g., VMWare's VDirector that tunnels
packets to the VMs' old locations to be either buffered temporarily
or forwarded along to the new locations (when the VMs are
ready to receive traffic but before network routing changes
have kicked in).

10. RELATED WORK 

Networked Services in the Cloud. Recent proposals [9, 5, 14, 19] 
and third party middleware [14] have begun to incorporate 
limited support for middleboxes. CloudNaaS [19], 
CloudSwitch [5] and VPNCubed [14] aim to provide flexible 
composition of virtual topologies; however, they don't 
have the mechanisms for scaling of networked services. 
Embrane [9] uses a proprietary framework that allows for the 
flexible scaling of networked services. However, it is limited 
to provider-offered middleboxes, and does not allow 
composing them with each other or with third-party MBs. 

Studies have looked at the properties of clouds that impact 
application performance [37, 26] and that affect application 
reliability [36]. Others have sought to enrich the networking 
layer of the cloud by adding frameworks that provide control 
over bandwidth [17, 23], security [20, 29], and performance 
of virtual migration [38]. These are largely complementary 
to Stratos. 

Split/Merge explores techniques that allow control over 
MB state so that MBs can be scaled up or down for elastic 
execution [30]. However, it does not consider MB 
composition, the issue of what triggers scaling, or how to manage 
the network interactions of the MBs during and after scaling, 
which form the focus of our work. That said, Split/Merge 
and Stratos are complementary to each other. 

Middleboxes in Enterprises and Datacenters. Issues in 
deployment and management of middleboxes have been 
examined in the context of enterprise [33] and data-center [24] 
networks. However, the focus there is on composition in physical 
infrastructures, and thus the performance challenges introduced by 
the lack of tight control in clouds are not addressed. 

VM Placement. Oversubscription within current data center 
networks and its impact on application performance and 
link utilizations have been widely studied [37, 26] [TSl. 
Recent works [19, 28] have explored using VM placement as a 
solution to this problem. In comparison with prior schemes, 
which focus on placing individual VMs in isolation, we 
focus on discovering groups of related VMs with dense 
communication patterns and colocating them. 

Scaling. Recent studies have considered the problem of 
scaling the number of virtual machines in each tier of a tenant's 
hierarchy [34, 2, 11]. All of them rely on CPU utilization, 
which we have shown to be insufficient. 

11. CONCLUSIONS 

Enhancing application deployments in today's clouds us- 
ing virtual middleboxes is challenging due to the lack of net- 
work control and the inherent difficulty in intelligently scal- 
ing middleboxes while taking network effects into account. 
Overcoming the challenges in a systematic way requires a 
new ground-up framework that explicitly manages the net- 
work configuration and network interactions of MBs. To this 
end, we presented the design, implementation, and evaluation 
of a network-aware orchestration layer for MBs, called 
Stratos. Stratos allows tenants to specify complex deployments 
using a simple logical topology abstraction. Then, the 
key components of Stratos (an application-aware scheme for 
scaling, rack-aware placement, and network-aware flow 
distribution) work in concert to carefully manage network 
resources at various time scales while elastically scaling 
entire tenant MB deployments to meet application demands. 
We conducted a thorough evaluation using a testbed, a 
deployment based on EC2, and large-scale simulations to show that 
Stratos helps tenants make more efficient scaling decisions, 
that all three network-aware components of Stratos are 
essential, that tenant applications make more effective use of MBs, 
and that providers' infrastructures are more effectively used. 